Abstract

Internet censorship is on the rise as websites around the world are increasingly blocked by 
government-level firewalls. Although popular anonymizing networks like Tor were originally 
designed to keep attackers from tracing people’s activities, many people are also using them to 
evade local censorship. But if the censor simply denies access to the Tor network itself, blocked 
users can no longer benefit from the security Tor offers. 

Here we describe a design that builds upon the current Tor network to provide an anonymiz- 
ing network that resists blocking by government-level attackers. 


1 Introduction 

Anonymizing networks like Tor [10] bounce traffic around a network of encrypting relays. Unlike
encryption, which hides only what is said, these networks also aim to hide who is communicating
with whom, which users are using which websites, and so on. These systems have a broad range
of users, including ordinary citizens who want to avoid being profiled for targeted advertisements,
corporations who don’t want to reveal information to their competitors, and law enforcement and
government intelligence agencies who need to do operations on the Internet without being noticed.

Historical anonymity research has focused on an attacker who monitors the user (call her Alice)
and tries to discover her activities, yet lets her reach any piece of the network. In more modern
threat models such as Tor’s, the adversary is allowed to perform active attacks such as modifying
communications to trick Alice into revealing her destination, or intercepting some connections to 
run a man-in-the-middle attack. But these systems still assume that Alice can eventually reach the 
anonymizing network. 

An increasing number of users are using the Tor software less for its anonymity properties than 
for its censorship resistance properties — if they use Tor to access Internet sites like Wikipedia and 
Blogspot, they are no longer affected by local censorship and firewall rules. In fact, an informal 
user study showed that a few hundred thousand people access the Tor network each day, with
about 20% of them coming from China. 

The current Tor design is easy to block if the attacker controls Alice’s connection to the Tor 
network — by blocking the directory authorities, by blocking all the relay IP addresses in the direc- 
tory, or by filtering based on the network fingerprint of the Tor TLS handshake. Here we describe an 
extended design that builds upon the current Tor network to provide an anonymizing network that 
resists censorship as well as anonymity-breaking attacks. In Section 2 we discuss our threat model,
that is, the assumptions we make about our adversary. Section 3 describes the components of the
current Tor design and how they can be leveraged for a new blocking-resistant design. Section 4
explains the features and drawbacks of the currently deployed solutions. In Sections 5 through 7, we
explore the components of our designs in detail. Section 8 considers security implications and
Section 9 presents other issues with maintaining connectivity and sustainability for the design. Finally,
Section 10 summarizes our next steps and recommendations.

2 Adversary assumptions 

To design an effective anti-censorship tool, we need a good model for the goals and resources of 
the censors we are evading. Otherwise, we risk spending our effort on keeping the adversaries from 
doing things they have no interest in doing, and thwarting techniques they do not use. The history of 
blocking-resistance designs is littered with conflicting assumptions about what adversaries to expect 
and what problems are in the critical path to a solution. Here we describe our best understanding of 
the current situation around the world. 

In the traditional security style, we aim to defeat a strong attacker — if we can defend against this 
attacker, we inherit protection against weaker attackers as well. After all, we want a general design 
that will work for citizens of China, Thailand, and other censored countries; for whistleblowers in 
firewalled corporate networks; and for people in unanticipated oppressive situations. In fact, by 
designing with a variety of adversaries in mind, we can take advantage of the fact that adversaries 
will be in different stages of the arms race at each location, so an address blocked in one locale can 
still be useful in others. We focus on an attacker with somewhat complex goals: 

• The attacker would like to restrict the flow of certain kinds of information, particularly when 
this information is seen as embarrassing to those in power (such as information about rights 
violations or corruption), or when it enables or encourages others to oppose them effectively 
(such as information about opposition movements or sites that are used to organize protests). 

• As a second-order effect, censors aim to chill citizens’ behavior by creating an impression 
that their online activities are monitored. 

• In some cases, censors make a token attempt to block a few sites for obscenity, blasphemy, 
and so on, but their efforts here are mainly for show. In other cases, they really do try hard to 
block such content. 

• Complete blocking (where nobody at all can ever download censored content) is not a goal. 
Attackers typically recognize that perfect censorship is not only impossible, it is unnecessary: 
if “undesirable” information is known only to a small few, further censoring efforts can be 
focused elsewhere. 

• Similarly, the censors do not attempt to shut down or block every anti-censorship tool — 
merely the tools that are popular and effective (because these tools impede the censors’ infor- 
mation restriction goals) and those tools that are highly visible (thus making the censors look 
ineffectual to their citizens and their bosses). 

• Reprisal against most passive consumers of most kinds of blocked information is also not a
goal, given the broadness of most censorship regimes. This seems borne out by fact.¹

• Producers and distributors of targeted information are in much greater danger than consumers;
the attacker would like to not only block their work, but identify them for reprisal.

• The censors (or their governments) would like to have a working, useful Internet. There are
economic, political, and social factors that prevent them from “censoring” the Internet by
outlawing it entirely, or by blocking access to all but a tiny list of sites. Nevertheless, the
censors are willing to block innocuous content (like the bulk of a newspaper’s reporting) in
order to censor other content distributed through the same channels (like that newspaper’s
coverage of the censored country).

We assume there are three main technical network attacks in use by censors currently [6]:

• Block a destination or type of traffic by automatically searching for certain strings or patterns
in TCP packets. Offending packets can be dropped, or can trigger a response like closing the
connection.

• Block certain IP addresses or destination ports at a firewall or other routing control point.

• Intercept DNS requests and give bogus responses for certain destination hostnames.

We assume the network firewall has limited CPU and memory per connection [6]. Against an
adversary who could carefully examine the contents of every packet and correlate the packets in
every stream on the network, we would need some stronger mechanism such as steganography,
which introduces its own problems [14, 25]. But we make a “weak steganography” assumption
here: to remain unblocked, it is necessary to remain unobservable only by computational resources
on par with a modern router, firewall, proxy, or IDS.

We assume that while various different regimes can coordinate and share notes, there will be a
time lag between one attacker learning how to overcome a facet of our design and other attackers
picking it up. (The most common vector of transmission seems to be commercial providers of
censorship tools: once a provider adds a feature to meet one country’s needs or requests, the feature
is available to all of the provider’s customers.) Conversely, we assume that insider attacks become
a higher risk only after the early stages of network development, once the system has reached a
certain level of success and visibility.

We do not assume that government-level attackers are always uniform across the country. For
example, users of different ISPs in China experience different censorship policies and mechanisms.

We assume that the attacker may be able to use political and economic resources to secure the
cooperation of extraterritorial or multinational corporations and entities in investigating information
sources. For example, the censors can threaten the service providers of troublesome blogs with
economic reprisals if they do not reveal the authors’ identities.

We assume that our users have control over their hardware and software: they don’t have any
spyware installed, there are no cameras watching their screens, etc. Unfortunately, in many
situations these threats are real [27]; yet software-based security systems like ours are poorly equipped
to handle a user who is entirely observed and controlled by the adversary. See Section 8.4 for more
discussion of what little we can do about this issue.

Similarly, we assume that the user will be able to fetch a genuine version of Tor, rather than one
supplied by the adversary; see Section 8.5 for discussion on helping the user confirm that he has a
genuine version and that he can connect to the real Tor network.

¹ So far in places like China, the authorities mainly go after people who publish materials and coordinate organized
movements [21]. If they find that a user happens to be reading a site that should be blocked, the typical response is simply
to block the site. Of course, even with an encrypted connection, the adversary may be able to distinguish readers from
publishers by observing whether Alice is mostly downloading bytes or mostly uploading them; we discuss this issue
more in Section 8.2.

3 Adapting the current Tor design to anti-censorship 

Tor is popular and sees a lot of use: it’s the largest anonymity network of its kind, and has attracted
more than 1500 volunteer-operated routers from around the world. Tor protects each user by routing
their traffic through a multiply encrypted “circuit” built of a few randomly selected relays, each of
which can remove only a single layer of encryption. Each relay sees only the step before it and
the step after it in the circuit, and so no single relay can learn the connection between a user and
her chosen communication partners. In this section, we examine some of the reasons why Tor has
become popular, with particular emphasis on how we can take advantage of these properties for a
blocking-resistance design.
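
The layering idea can be illustrated with a toy sketch. This is not Tor's actual cryptography: the keystream construction, the key names, and the function names below are all assumptions made for illustration, and the XOR scheme is not secure. It only shows that the client adds one layer per relay and that each relay can strip exactly one.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive a toy keystream from a key (illustration only, not secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def apply_layer(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call adds or removes a layer."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def wrap_for_circuit(hop_keys, payload: bytes) -> bytes:
    """Client side: add one layer per relay, innermost layer for the last hop."""
    cell = payload
    for key in reversed(hop_keys):
        cell = apply_layer(key, cell)
    return cell

# Hypothetical per-hop keys the client shares with each relay in its circuit.
hop_keys = [b"key-entry", b"key-middle", b"key-exit"]
cell = wrap_for_circuit(hop_keys, b"GET / HTTP/1.0")
for key in hop_keys:
    cell = apply_layer(key, cell)  # each relay strips only its own layer
```

Because XOR layers commute, this toy does not enforce the order in which relays process a cell. A real design uses ciphers and per-hop state for which the order matters.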

Tor aims to provide three security properties:

1. A local network attacker can’t learn, or influence, your destination.

2. No single router in the Tor network can link you to your destination.

3. The destination, or somebody watching the destination, can’t learn your location.

For blocking-resistance, we care most clearly about the first property. But as the arms race
progresses, the second property will become important, for example to discourage an adversary
from volunteering a relay in order to learn that Alice is reading or posting to certain websites.
The third property helps keep users safe from collaborating websites: consider websites and other
Internet services that have been pressured recently into revealing the identity of bloggers or treating
clients differently depending on their network location [16].

The Tor design provides other features as well that are not typically present in manual or ad hoc
circumvention techniques.

First, Tor has a well-analyzed and well-understood way to distribute information about relays. Tor
directory authorities automatically aggregate, test, and publish signed summaries of the available
Tor routers. Tor clients can fetch these summaries to learn which routers are available and which
routers are suitable for their needs. Directory information is cached throughout the Tor network, so
once clients have bootstrapped they never need to interact with the authorities directly. (To tolerate
a minority of compromised directory authorities, we use a threshold trust scheme; see Section 8.5
for details.)
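
As a rough illustration of why a threshold scheme tolerates a minority of bad authorities, consider the following sketch. It assumes each authority publishes a set of relay fingerprints and that a client accepts a relay only if a strict majority of authorities list it. The function name, the data shapes, and the majority rule are assumptions for this example; the actual scheme is discussed in Section 8.5.

```python
from collections import Counter

def threshold_consensus(votes, threshold=None):
    """Return the relays listed by at least `threshold` authorities.

    votes: list of sets of relay fingerprints, one set per authority.
    threshold: defaults to a strict majority of the authorities (assumed rule).
    """
    if threshold is None:
        threshold = len(votes) // 2 + 1
    counts = Counter(fp for vote in votes for fp in set(vote))
    return {fp for fp, n in counts.items() if n >= threshold}

# Two honest authorities and one compromised authority that injects a relay.
votes = [
    {"relayA", "relayB"},
    {"relayA", "relayB"},
    {"relayA", "evilRelay"},
]
accepted = threshold_consensus(votes)
```

In this example the single compromised authority can neither add its own relay to the accepted set nor remove a relay that the honest majority lists.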

Second, the list of directory authorities is not hard-wired. Clients use the default authorities if no
others are specified, but it’s easy to start a separate (or even overlapping) Tor network just by running
a different set of authorities and convincing users to prefer a modified client. For example, we could
launch a distinct Tor network inside China; some users could even use an aggregate network made
up of both the main network and the China network. (But we should not be too quick to create other
Tor networks: part of Tor’s anonymity comes from users behaving like other users, and there are
many unsolved anonymity questions if different users know about different pieces of the network.)

Third, in addition to automatically learning from the chosen directories which Tor routers are
available and working, Tor takes care of building paths through the network and rebuilding them as
needed. So the user never has to know how paths are chosen, never has to manually pick working
proxies, and so on. More generally, at its core the Tor protocol is simply a tool that can build paths
given a set of routers. Tor is quite flexible about how it learns about the routers and how it chooses 
the paths. Harvard’s Blossom project [15] makes this flexibility more concrete: Blossom makes 
use of Tor not for its security properties but for its reachability properties. It runs a separate set of 
directory authorities, its own set of Tor routers (called the Blossom network), and uses Tor’s flexible 
path-building to let users view Internet resources from any point in the Blossom network. 

Fourth, Tor separates the role of internal relay from the role of exit relay. That is, some volun- 
teers choose just to relay traffic between Tor users and Tor routers, and others choose to also allow 
connections to external Internet resources. Because we don’t force all volunteers to play both roles, 
we end up with more relays. This increased diversity in turn is what gives Tor its security: the more 
options the user has for her first hop, and the more options she has for her last hop, the less likely 
it is that a given attacker will be watching both ends of her circuit [10]. As a bonus, because our 
design attracts more internal relays that want to help out but don’t want to deal with being an exit 
relay, we end up providing more options for the first hop — the one most critical to being able to 
reach the Tor network. 

Fifth, Tor is sustainable. Zero-Knowledge Systems offered the commercial but now defunct 
Freedom Network [2], a design with security comparable to Tor’s, but its funding model relied on 
collecting money from users to pay relay operators. Modern commercial proxy systems similarly 
need to keep collecting money to support their infrastructure. On the other hand, Tor has built a
self-sustaining community of volunteers who donate their time and resources. This community trust is
rooted in Tor’s open design: we tell the world exactly how Tor works, and we provide all the source
code. Users can decide for themselves, or pay any security expert to decide, whether it is safe to
use. Further, Tor’s modularity as described above, along with its open license, means that its impact
will continue to grow.

Sixth, Tor has an established user base of hundreds of thousands of people from around the 
world. This diversity of users contributes to sustainability as above: Tor is used by ordinary citizens, 
activists, corporations, law enforcement, and even government and military users, and they can only 
achieve their security goals by blending together in the same network [1, 8]. This user base also 
provides something else: hundreds of thousands of different and often-changing addresses that we 
can leverage for our blocking-resistance design. 

Finally and perhaps most importantly, Tor provides anonymity and prevents any single relay
from linking users to their communication partners. Despite initial appearances, distributed-trust 
anonymity is critical for anti-censorship efforts. If any single relay can expose dissident bloggers 
or compile a list of users’ behavior, the censors can profitably compromise that relay’s operator, 
perhaps by applying economic pressure to their employers, breaking into their computer, pressuring 
their family (if they have relatives in the censored area), or so on. Furthermore, in designs where 
any relay can expose its users, the censors can spread suspicion that they are running some of the 
relays and use this belief to chill use of the network. 

We discuss and adapt these components further in Section 5. But first we examine the strengths 
and weaknesses of other blocking-resistance approaches, so we can expand our repertoire of build- 
ing blocks and ideas. 
4 Current proxy solutions

Relay-based blocking-resistance schemes generally have two main components: a relay component
and a discovery component. The relay part encompasses the process of establishing a connection,
sending traffic back and forth, and so on, that is, everything that’s done once the user knows where she’s
going to connect. Discovery is the step before that: the process of finding one or more usable relays.

For example, we can divide the pieces of Tor in the previous section into the process of building
paths and sending traffic over them (relay) and the process of learning from the directory authorities
about what routers are available (discovery). With this distinction in mind, we now examine several
categories of relay-based schemes.

4.1 Centrally-controlled shared proxies 

Existing commercial anonymity solutions (like Anonymizer.com) are based on a set of single-hop
proxies. In these systems, each user connects to a single proxy, which then relays traffic between the
user and her destination. These public proxy systems are typically characterized by two features: 
they control and operate the proxies centrally, and many different users get assigned to each proxy. 

In terms of the relay component, single proxies provide weak security compared to systems 
that distribute trust over multiple relays, since a compromised proxy can trivially observe all of its 
users’ actions, and an eavesdropper only needs to watch a single proxy to perform timing correlation 
attacks against all its users’ traffic and thus learn where everyone is connecting. Worse, all users 
need to trust the proxy company to have good security itself as well as to not reveal user activities. 

On the other hand, single-hop proxies are easier to deploy, and they can provide better performance
than distributed-trust designs like Tor, since traffic only goes through one relay. They’re also
more convenient from the user’s perspective — since users entirely trust the proxy, they can just use 
their web browser directly. 

Whether public proxy schemes are more or less scalable than Tor is still up for debate: commer- 
cial anonymity systems can use some of their revenue to provision more bandwidth as they grow, 
whereas volunteer-based anonymity systems can attract thousands of fast relays to spread the load. 

The discovery piece can take several forms. Most commercial anonymous proxies have one or 
a handful of commonly known websites, and their users log in to those websites and relay their 
traffic through them. When these websites get blocked (generally soon after the company becomes 
popular), if the company cares about users in the blocked areas, they start renting lots of disparate IP 
addresses and rotating through them as they get blocked. They notify their users of new addresses 
(by email, for example). It’s an arms race, since attackers can sign up to receive the email too, but 
operators have one nice trick available to them: because they have a list of paying subscribers, they 
can notify certain subscribers about updates earlier than others. 

Access control systems on the proxy let them provide service only to users with certain charac- 
teristics, such as paying customers or people from certain IP address ranges. 

Discovery in the face of a government-level firewall is a complex and unsolved topic, and we’re 
stuck in this same arms race ourselves; we explore it in more detail in Section 7. But first we 
examine the other end of the spectrum — getting volunteers to run the proxies, and telling only a few 
people about each proxy. 
4.2 Independent personal proxies

Personal proxies such as Circumventor [17] and CGIProxy [22] use the same technology as the 
public ones as far as the relay component goes, but they use a different strategy for discovery. 
Rather than managing a few centralized proxies and constantly getting new addresses for them as 
the old addresses are blocked, they aim to have a large number of entirely independent proxies, each 
managing its own (much smaller) set of users. 

As the Circumventor site explains, “You don’t actually install the Circumventor on the computer 
that is blocked from accessing Web sites. You, or a friend of yours, has to install the Circumventor 
on some other machine which is not censored.” 

This tactic has great advantages in terms of blocking-resistance: recall our assumption in Section 2
that the attention a system attracts from the attacker is proportional to its number of users and
level of publicity. If each proxy only has a few users, and there is no central list of proxies, most of 
them will never get noticed by the censors. 

On the other hand, there’s a huge scalability question that so far has prevented these schemes 
from being widely useful: how does the fellow in China find a person in Ohio who will run a 
Circumventor for him? In some cases he may know and trust some people on the outside, but in 
many cases he’s just out of luck. Just as hard, how does a new volunteer in Ohio find a person in
China who needs it?

This challenge leads to a hybrid design, centrally-distributed personal proxies, which we will
investigate in more detail in Section 7.

4.3 Open proxies 

Yet another currently used approach to bypassing firewalls is to locate open and misconfigured
proxies on the Internet. A quick Google search for “open proxy list” yields a wide variety of freely
available lists of HTTP, HTTPS, and SOCKS proxies. Many small companies have sprung up
providing more refined lists to paying customers.

There are some downsides to using these open proxies though. First, the proxies are of widely
varying quality in terms of bandwidth and stability, and many of them are entirely unreachable.
Second, unlike networks of volunteers like Tor, the legality of routing traffic through these proxies is
questionable: it’s widely believed that most of them don’t realize what they’re offering, and probably
wouldn’t allow it if they realized. Third, in many cases the connection to the proxy is unencrypted,
so firewalls that filter based on keywords in IP packets will not be hindered. Fourth, in many
countries (including China), the firewall authorities hunt for open proxies as well, to preemptively
block them. And last, many users are suspicious that some open proxies are a little too convenient:
are they run by the adversary, in which case they get to monitor all the user’s requests just as
single-hop proxies can?

A distributed-trust design like Tor resolves each of these issues for the relay component, but a
constantly changing set of thousands of open relays is clearly a useful idea for a discovery
component. For example, users might be able to make use of these proxies to bootstrap their first
introduction into the Tor network.
4.4 Blocking resistance and JAP

Köpsell and Hillig’s Blocking Resistance design [19] is probably the closest related work, and is
the starting point for the design in this paper. In this design, the JAP anonymity system [3] is used
as a base instead of Tor. Volunteers operate a large number of access points that relay traffic to
the core JAP network, which in turn anonymizes users’ traffic. The software to run these relays
is, as in our design, included in the JAP client software and enabled only when the user decides to
enable it. Discovery is handled with a CAPTCHA-based mechanism; users prove that they aren’t
an automated process, and are given the address of an access point. (The problem of a determined
attacker with enough manpower to launch many requests and enumerate all the access points is not
considered in depth.) There is also some suggestion that information about access points could
spread through existing social networks.

4.5 Infranet 

The Infranet design [13] uses one-hop relays to deliver web content, but disguises its
communications as ordinary HTTP traffic. Requests are split into multiple requests for URLs on the relay,
which then encodes its responses in the content it returns. The relay needs to be an actual website
with plausible content and a number of URLs which the user might want to access: if the Infranet
software produced its own cover content, it would be far easier for censors to identify. To keep
the censors from noticing that cover content changes depending on what data is embedded, Infranet
needs the cover content to have an innocuous reason for changing frequently: the paper recommends
watermarked images and webcams.

The attacker and relay operators in Infranet’s threat model are significantly different than in ours.
Unlike our attacker, Infranet’s censor can’t be bypassed with encrypted traffic (presumably because
the censor blocks encrypted traffic, or at least considers it suspicious), and has more computational
resources to devote to each connection than ours (so it can notice subtle patterns over time). Unlike
our bridge operators, Infranet’s operators (and users) have more bandwidth to spare; the overhead
in typical steganography schemes is far higher than Tor’s.

The Infranet design does not include a discovery element. Discovery, however, is a critical
point: if whatever mechanism allows users to learn about relays also allows the censor to do so,
he can trivially discover and block their addresses, even if the steganography would prevent mere
traffic observation from revealing the relays’ addresses.

4.6 RST-evasion and other packet-level tricks 

In their analysis of China’s firewall’s content-based blocking, Clayton, Murdoch and Watson
discovered that rather than blocking all packets in a TCP stream once a forbidden word was noticed,
the firewall was simply forging RST packets to make the communicating parties believe that the
connection was closed [6]. They proposed altering operating systems to ignore forged RST packets.
This approach might work in some cases, but in practice it appears that many firewalls start filtering
by IP address once a sufficient number of RST packets have been sent.

Other packet-level responses to filtering include splitting sensitive words across multiple TCP
packets, so that the censors’ firewalls can’t notice them without performing expensive stream
reconstruction [26]. This technique relies on the same insight as our weak steganography assumption.
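
The word-splitting trick can be sketched with a simplified model in which a "packet" is just a byte string and the censor is a substring matcher. The keyword, the function names, and the choice to cut each keyword at its midpoint are assumptions for illustration; real firewalls and TCP stacks are considerably more involved.

```python
FORBIDDEN = b"forbidden"  # hypothetical keyword the censor searches for

def per_packet_filter(packets, keyword=FORBIDDEN):
    """Cheap censor: inspects each packet on its own, keeping no stream state."""
    return any(keyword in packet for packet in packets)

def stream_filter(packets, keyword=FORBIDDEN):
    """Expensive censor: reassembles the stream before searching."""
    return keyword in b"".join(packets)

def split_payload(payload, keyword=FORBIDDEN):
    """Cut the payload so that no single segment contains the whole keyword."""
    if len(keyword) < 2:
        raise ValueError("keyword must be at least two bytes long")
    segments = []
    start = 0
    while True:
        idx = payload.find(keyword, start)
        if idx == -1:
            break
        cut = idx + len(keyword) // 2  # break the keyword in the middle
        segments.append(payload[start:cut])
        start = cut
    segments.append(payload[start:])
    return segments

payload = b"GET /search?q=forbidden HTTP/1.0"
segments = split_payload(payload)
```

Here the per-packet filter matches the unsplit payload but misses the split segments, while the stream filter still catches them. The evasion therefore holds only as long as the censor cannot afford per-connection reassembly.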



4.7 Tor itself 

And last, we include Tor itself in the list of current solutions to firewalls. Tens of thousands of 
people use Tor from countries that routinely filter their Internet. Tor’s website has been blocked in 
most of them. But why hasn’t the Tor network been blocked yet? 

We have several theories. The first is the most straightforward: tens of thousands of people are 
simply too few to matter. It may help that Tor is perceived to be for experts only, and thus not worth 
attention yet. The more subtle variant on this theory is that we’ve positioned Tor in the public eye as 
a tool for retaining civil liberties in more free countries, so perhaps blocking authorities don’t view 
it as a threat. (We revisit this idea when we consider whether and how to publicize a Tor variant that 
improves blocking-resistance — see Section 9.5 for more discussion.) 

The broader explanation is that the maintenance of most government-level filters is aimed at
stopping widespread information flow and appearing to be in control, not at the impossible goal
of blocking all possible ways to bypass censorship. Censors realize that there will always be ways
for a few people to get around the firewall, and as long as Tor has not publicly threatened their
control, they see no urgent need to block it yet.

We should recognize that we’re already in the arms race. These constraints can give us insight
into the priorities and capabilities of our various attackers.

5 The relay component of our blocking-resistant design 

Section 3 describes many reasons why Tor is well-suited as a building block in our context, but
several changes will allow the design to resist blocking better. The most critical changes are to get
more relay addresses, and to distribute them to users differently.

5.1 Bridge relays 

Today, Tor relays operate on a few thousand distinct IP addresses; an adversary could enumerate
and block them all with little trouble. To provide a means of ingress to the network, we need a larger
set of entry points, most of which an adversary won’t be able to enumerate easily. Fortunately, we
have such a set: the Tor users.

Hundreds of thousands of people around the world use Tor. We can leverage our already
self-selected user base to produce a list of thousands of frequently-changing IP addresses. Specifically,
we can give them a little button in the GUI that says “Tor for Freedom”, and users who click the
button will turn into bridge relays (or just bridges for short). They can rate limit relayed connections to
10 KB/s (almost nothing for a broadband user in a free country, but plenty for a user who otherwise
has no access at all), and since they are just relaying bytes back and forth between blocked users
and the main Tor network, they won’t need to make any external connections to Internet sites.
Because of this separation of roles, and because we’re making use of software that the volunteers have
already installed for their own use, we expect our scheme to attract and maintain more volunteers
than previous schemes.
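
One common way to enforce a cap like the 10 KB/s figure above is a token bucket. The sketch below is a generic illustration and not Tor's own rate limiter: the class name, the reading of 10 KB as 10 * 1024 bytes, and the one-second burst allowance are assumptions. Time is passed in explicitly so that the behavior is deterministic.

```python
class TokenBucket:
    """Generic token-bucket limiter (illustrative; not Tor's implementation)."""

    def __init__(self, rate_bytes_per_sec=10 * 1024, burst_bytes=None):
        self.rate = float(rate_bytes_per_sec)
        # Assumed default: allow a burst of at most one second's worth of data.
        self.capacity = float(burst_bytes if burst_bytes is not None
                              else rate_bytes_per_sec)
        self.tokens = self.capacity
        self.last = 0.0

    def allow(self, nbytes, now):
        """Return True and spend tokens if nbytes may be relayed at time now."""
        elapsed = max(0.0, now - self.last)
        self.last = now
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

bucket = TokenBucket()
first = bucket.allow(10 * 1024, now=0.0)   # uses the whole initial burst
second = bucket.allow(1, now=0.0)          # nothing left until time passes
third = bucket.allow(5 * 1024, now=0.5)    # half a second refills 5 KB
```

A bridge operator would call `allow` before relaying each chunk and delay or drop traffic when it returns False.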

As usual, there are new anonymity and security implications from running a bridge relay,
particularly from letting people relay traffic through your Tor client; but we leave this discussion for
Section 8.
5.2 The bridge directory authority

How do the bridge relays advertise their existence to the world? We introduce a second new
component of the design: a specialized directory authority that aggregates and tracks bridges. Bridge
relays periodically publish relay descriptors (summaries of their keys, locations, etc., signed by their
long-term identity key), just like the relays in the “main” Tor network, but in this case they publish
them only to the bridge directory authorities.

The main difference between bridge authorities and the directory authorities for the main Tor 
network is that the main authorities provide a list of every known relay, but the bridge authorities 
only give out a relay descriptor if you already know its identity key. That is, you can keep up-to-date 
on a bridge’s location and other information once you know about it, but you can’t just grab a list 
of all the bridges. 
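
To make the access rule concrete, here is a minimal sketch of a store that answers only exact-fingerprint lookups and offers no way to list its contents. The class name, the descriptor fields, and the use of a hex-encoded SHA-1 over the raw key bytes as the fingerprint are all assumptions for illustration, not the deployed bridge authority interface.

```python
import hashlib


class BridgeAuthorityStore:
    """Toy in-memory store (hypothetical): fetch descriptors by fingerprint only."""

    def __init__(self):
        self._descriptors = {}  # fingerprint (hex) -> descriptor dict

    @staticmethod
    def fingerprint(identity_key: bytes) -> str:
        # Assumption: the fingerprint is a hex SHA-1 digest of the key bytes.
        return hashlib.sha1(identity_key).hexdigest()

    def publish(self, identity_key: bytes, descriptor: dict) -> str:
        # A bridge publishes (or refreshes) its descriptor under its own key.
        fp = self.fingerprint(identity_key)
        self._descriptors[fp] = dict(descriptor)
        return fp

    def lookup(self, fp: str):
        # Only a caller who already knows the fingerprint gets an answer;
        # deliberately, there is no method that enumerates all bridges.
        found = self._descriptors.get(fp)
        return dict(found) if found is not None else None
```

A client that already knows a bridge's fingerprint can call `lookup` repeatedly to follow the bridge as its address changes, while a caller without the fingerprint learns nothing.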

The identity key, IP address, and directory port for each bridge authority ship by default with the 
Tor software, so the bridge relays can be confident they’re publishing to the right location, and the 
blocked users can establish an encrypted authenticated channel. See Section 8.5 for more discussion 
of the public key infrastructure and trust chain. 

Bridges use Tor to publish their descriptors privately and securely, so even an attacker moni- 
toring the bridge directory authority’s network can’t make a list of all the addresses contacting the 
authority. Bridges may publish to only a subset of the authorities, to limit the potential impact of an 
authority compromise. 

5.3 Putting them together 

If a blocked user knows the identity keys of a set of bridge relays, and he has correct address 
information for at least one of them, he can use that one to make a secure connection to the bridge 
authority and update his knowledge about the other bridge relays. He can also use it to make secure 
connections to the main Tor network and directory authorities, so he can build circuits and connect 
to the rest of the Internet. All of these updates happen in the background: from the blocked user’s 
perspective, he just accesses the Internet via his Tor client like always. 

So now we’ve reduced the problem from how to circumvent the firewall for all transactions (and
how to know that the pages you get have not been modified by the local attacker) to how to learn
about a working bridge relay.

There’s another catch though. We need to make sure that the network traffic we generate by
simply connecting to a bridge relay doesn’t stand out too much.

6 Hiding Tor’s network fingerprint 

Currently, Tor uses two protocols for its network communications. The main protocol uses TLS
for encrypted and authenticated communication between Tor instances. The second protocol is 
standard HTTP, used for fetching directory information. All Tor relays listen on their “ORPort” 
for TLS connections, and some of them opt to listen on their “DirPort” as well, to serve directory 
information. Tor relays choose whatever port numbers they like; the relay descriptor they publish 
to the directory tells users where to connect. 

One format for communicating address information about a bridge relay is its IP address and 
DirPort. From there, the user can ask the bridge’s directory cache for an up-to-date copy of its relay 
descriptor, and learn its current circuit keys, its ORPort, and so on.

However, connecting directly to the directory cache involves a plaintext HTTP request. A censor 
could create a network fingerprint (known as a signature in the intrusion detection field) for the
request and/or its response, thus preventing these connections. To resolve this vulnerability, we’ve
modified the Tor protocol so that users can connect to the directory cache via the main Tor port:
they establish a TLS connection with the bridge as normal, and then send a special “begindir” relay
command to establish an internal connection to its directory cache. 

Therefore a better way to summarize a bridge’s address is by its IP address and ORPort, so all 
communications between the client and the bridge will use ordinary TLS. But there are other details 
that need more investigation. 

What port should bridges pick for their ORPort? We currently recommend that they listen on 
port 443 (the default HTTPS port) if they want to be most useful, because clients behind standard 
firewalls will have the best chance to reach them. Is this the best choice in all cases, or should we
encourage some fraction of them to pick random ports, or other ports commonly permitted through
firewalls like 53 (DNS) or 110 (POP)? Or perhaps we should use other ports where TLS traffic is
expected, like 993 (IMAPS) or 995 (POP3S). We need more research on our potential users, and
their current and anticipated firewall restrictions.

Furthermore, we need to look at the specifics of Tor’s TLS handshake. Right now Tor uses some
predictable strings in its TLS handshakes. For example, it sets the X.509 organizationName field to
“Tor”, and it puts the Tor relay’s nickname in the certificate’s commonName field. We should tweak
the handshake protocol so it doesn’t rely on any unusual details in the certificate, yet it remains
secure; the certificate itself should be made to resemble an ordinary HTTPS certificate. We should
also try to make our advertised cipher-suites closer to what an ordinary web server would support.

Tor’s TLS handshake uses two-certificate chains: one certificate contains the self-signed identity
key for the router, and the second contains a current TLS key, signed by the identity key. We
use these to authenticate that we’re talking to the right router, and to limit the impact of TLS-key
exposure. Most (though far from all) consumer-oriented HTTPS services provide only a single
certificate. These extra certificates may help identify Tor’s TLS handshake; instead, bridges should
consider using only a single TLS key certificate signed by their identity key, and providing the full
value of the identity key in an early handshake cell. More significantly, Tor currently has all clients
present certificates, so that clients are harder to distinguish from relays. But in a blocking-resistance
environment, clients should not present certificates at all.

Last, what if the adversary starts observing the network traffic even more closely? Even if our
TLS handshake looks innocent, our traffic timing and volume still look different than a user making
a secure web connection to his bank. The same techniques used in the growing trend to build tools
to recognize encrypted Bittorrent traffic could be used to identify Tor communication and recognize
bridge relays. Rather than trying to look like encrypted web traffic, we may be better off trying
to blend with some other encrypted network protocol. The first step is to compare typical network
behavior for a Tor client to typical network behavior for various other protocols. This statistical
cat-and-mouse game is made more complex by the fact that Tor transports a variety of protocols,
and we’ll want to automatically handle web browsing differently from, say, instant messaging.

6.1 Identity keys as part of addressing information

We have described a way for the blocked user to bootstrap into the network once he knows the IP
address and ORPort of a bridge. What about local spoofing attacks? That is, since we never learned
an identity key fingerprint for the bridge, a local attacker could intercept our connection and pretend
to be the bridge we had in mind. It turns out that giving false information isn’t that bad: since the
Tor client ships with trusted keys for the bridge directory authority and the Tor network directory
authorities, the user can learn whether he’s being given a real connection to the bridge authorities
or not. (After all, if the adversary intercepts every connection the user makes and gives him a bad
connection each time, there’s nothing we can do.)

What about anonymity-breaking attacks from observing traffic, if the blocked user doesn’t start
out knowing the identity key of his intended bridge? The vulnerabilities aren’t so bad in this case
either: the adversary could do similar attacks just by monitoring the network traffic.

Once the Tor client has fetched the bridge’s relay descriptor, it should remember the identity
key fingerprint for that bridge relay. Thus if the bridge relay moves to a new IP address, the client
can query the bridge directory authority to look up a fresh relay descriptor using this fingerprint.

So we’ve shown that it’s possible to bootstrap into the network just by learning the IP address
and ORPort of a bridge, but are there situations where it’s more convenient or more secure to learn
the bridge’s identity fingerprint as well, or instead, while bootstrapping? We keep that question in
mind as we next investigate bootstrapping and discovery.

7 Discovering working bridge relays 

Tor’s modular design means that we can develop a better relay component independently of developing
the discovery component. This modularity’s great promise is that we can pick any discovery
approach we like; but the unfortunate fact is that we have no magic bullet for discovery. We’re in
the same arms race as all the other designs we described in Section 4.

In this section we describe a variety of approaches to adding discovery components for our
design.

7.1 Bootstrapping: finding your first bridge. 

In Section 5.3, we showed that a user who knows a working bridge address can use it to reach the
bridge authority and to stay connected to the Tor network. But how do new users reach the bridge
authority in the first place? After all, the bridge authority will be one of the first addresses that a
censor blocks.

First, we should recognize that most government firewalls are not perfect. That is, they may
allow connections to Google cache or some open proxy servers, or they let file-sharing traffic,
Skype, instant messaging, or World-of-Warcraft connections through. Different users will have
different mechanisms for bypassing the firewall initially. Second, we should remember that most
people don’t operate in a vacuum; users will hopefully know other people who are in other situations
or have other resources available. In the rest of this section we develop a toolkit of different options
and mechanisms, so that we can enable users in a diverse set of contexts to bootstrap into the system.

(For users who can’t use any of these techniques, hopefully they know a friend who can; for
example, perhaps the friend already knows some bridge relay addresses. If they can’t get around it at
all, then we can’t help them: they should go meet more people or learn more about the technology
running the firewall in their area.)

By deploying all the schemes in the toolkit at once, we let bridges and blocked users employ the
discovery approach that is most appropriate for their situation.

7.2 Independent bridges, no central discovery 

The first design is simply to have no centralized discovery component at all. Volunteers run bridges,
and we assume they have some blocked users in mind and communicate their address information
to them out-of-band (for example, through Gmail). This design allows for small personal bridges
that have only one or a handful of users in mind, but it can also support an entire community of
users. For example, Citizen Lab’s upcoming Psiphon single-hop proxy tool [12] plans to use this
social network approach as its discovery component.

There are several ways to do bootstrapping in this design. In the simple case, the operator of
the bridge informs each chosen user about his bridge’s address information and/or keys. A different
approach involves blocked users introducing new blocked users to the bridges they know. That is,
somebody in the blocked area can pass along a bridge’s address to somebody else they trust. This
scheme brings in appealing but complex game theoretic properties: the blocked user making the
decision has an incentive only to delegate to trustworthy people, since an adversary who learns the
bridge’s address and filters it makes it unavailable for both of them. Also, delegating known bridges
to members of your social network can be dangerous: an adversary who can learn who knows
which bridges may be able to reconstruct the social network.

Note that a central set of bridge directory authorities can still be compatible with a decentralized
discovery process. That is, how users first learn about bridges is entirely up to the bridges, but the
process of fetching up-to-date descriptors for them can still proceed as described in Section 5. Of 
course, creating a central place that knows about all the bridges may not be smart, especially if every 
other piece of the system is decentralized. Further, if a user only knows about one bridge and he 
loses track of it, it may be quite a hassle to reach the bridge authority. We address these concerns 
next. 


7.3 Families of bridges, no central discovery 

Because the blocked users are running our software too, we have many opportunities to improve 
usability or robustness. Our second design builds on the first by encouraging volunteers to run 
several bridges at once (or coordinate with other bridge volunteers), such that some of the bridges 
are likely to be available at any given time. 

The blocked user’s Tor client would periodically fetch an updated set of recommended bridges 
from any of the working bridges. Now the client can learn new additions to the bridge pool, and can 
expire abandoned bridges or bridges that the adversary has blocked, without the user ever needing to 
care. To simplify maintenance of the community’s bridge pool, each community could run its own 
bridge directory authority, reachable via the available bridges, and also mirrored at each bridge.

7.4 Public bridges with central discovery 

What about people who want to volunteer as bridges but don’t know any suitable blocked users? 
What about people who are blocked but don’t know anybody on the outside? Here we describe how 
to make use of these public bridges in a way that still makes it hard for the attacker to learn all of
them. 

The basic idea is to divide public bridges into a set of pools based on identity key. Each pool
corresponds to a distribution strategy: an approach to distributing its bridge addresses to users. Each
strategy is designed to exercise a different scarce resource or property of the user.

How do we divide bridges between these strategy pools such that they’re evenly distributed and
the allocation is hard to influence or predict, but also in a way that’s amenable to creating more
strategies later on without reshuffling all the pools? We assign a given bridge to a strategy pool by
hashing the bridge’s identity key along with a secret that only the bridge authority knows: the first
n bits of this hash dictate the strategy pool number, where n is a parameter that describes how many
strategy pools we want at this point. We choose n = 3 to start, so we divide bridges between 8 pools;
but as we later invent new distribution strategies, we can increment n to split the 8 into 16. Since a
bridge can’t predict the next bit in its hash, it can’t anticipate which identity key will correspond to
a certain new pool when the pools are split. Further, since the bridge authority doesn’t provide any
feedback to the bridge about which strategy pool it’s in, an adversary who signs up bridges with the
goal of filling a certain pool [11] will be hindered.
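
The pool assignment can be sketched in a few lines. The text only says the identity key is hashed "along with a secret"; using HMAC-SHA256 as the keyed hash, and the function and parameter names, are our assumptions for illustration.

```python
import hashlib
import hmac


def strategy_pool(identity_key: bytes, authority_secret: bytes, n_bits: int = 3) -> int:
    """Return the pool number given by the first n_bits of a keyed hash.

    Assumption: HMAC-SHA256 stands in for "hash the key with a secret".
    """
    if not 0 < n_bits <= 256:
        raise ValueError("n_bits must be between 1 and 256")
    digest = hmac.new(authority_secret, identity_key, hashlib.sha256).digest()
    # Interpret the 256-bit digest as an integer and keep its top n_bits.
    return int.from_bytes(digest, "big") >> (256 - n_bits)
```

Because the pool number is a prefix of the hash, incrementing n splits each existing pool in two without moving any bridge into an unrelated pool: a bridge in pool p under n = 3 lands in pool 2p or 2p + 1 under n = 4, and which of the two depends on a bit the bridge cannot predict without the secret.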

The first distribution strategy (used for the first pool) publishes bridge addresses in a time-release
fashion. The bridge authority divides the available bridges into partitions, and each partition
is deterministically available only in certain time windows. That is, over the course of a given time
slot (say, an hour), each requester is given a random bridge from within that partition. When the next
time slot arrives, a new set of bridges from the pool are available for discovery. Thus some bridge
address is always available when a new user arrives, but to learn about all bridges the attacker needs
to fetch all new addresses at every new time slot. By varying the length of the time slots, we can
make it harder for the attacker to guess when to check back. We expect these bridges will be the first
to be blocked, but they’ll help the system bootstrap until they do get blocked. Further, remember
that we’re dealing with different blocking regimes around the world that will progress at different
rates, so this pool will still be useful to some users even as the arms races progress.
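
As a rough illustration of time-release distribution, the sketch below rotates through the partitions one fixed-length slot at a time and hands each requester a random bridge from the currently active partition. The round-robin rotation, the fixed slot length, and all names are assumptions; the text leaves the schedule open and suggests varying the slot length.

```python
import random


def active_partition(now_seconds: int, slot_seconds: int, num_partitions: int) -> int:
    """Index of the partition that is open during the slot containing now_seconds."""
    # Assumption: partitions take turns in simple round-robin order.
    return (now_seconds // slot_seconds) % num_partitions


def pick_bridge(partitions, now_seconds: int, slot_seconds: int = 3600, rng=random):
    """Give the requester a random bridge from the currently open partition."""
    index = active_partition(now_seconds, slot_seconds, len(partitions))
    return rng.choice(partitions[index])
```

With three partitions and one-hour slots, a request in the first hour can only learn bridges from partition 0, so an attacker has to come back in every slot to see the whole pool.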

The second distribution strategy publishes bridge addresses based on the IP address of the requesting
user. Specifically, the bridge authority will divide the available bridges in the pool into a
bunch of partitions (as in the first distribution scheme), hash the requester’s IP address with a secret
of its own (as in the above allocation scheme for creating pools), and give the requester a random
bridge from the appropriate partition. To raise the bar, we should discard the last octet of the IP address
before inputting it to the hash function, so an attacker who only controls a single “/24” network
only counts as one user. A large attacker like China will still be able to control many addresses, but
the hassle of establishing connections from each network (or spoofing TCP connections) may still
slow them down. Similarly, as a special case, we should treat IP addresses that are Tor exit nodes as
all being on the same network.
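
A minimal sketch of the address-to-partition mapping follows, assuming IPv4 requesters, HMAC-SHA256 as the keyed hash, and a caller-supplied set of known Tor exit addresses. These choices and all names are ours, made only for illustration.

```python
import hashlib
import hmac
import ipaddress


def network_key(ip: str, exit_nodes=frozenset()) -> bytes:
    """Collapse an IPv4 address to the bytes that identify its network."""
    if ip in exit_nodes:
        # Special case from the text: all Tor exits count as one network.
        return b"tor-exit"
    addr = ipaddress.IPv4Address(ip)
    # Zero the last octet so every address in a /24 maps to the same key.
    return (int(addr) & 0xFFFFFF00).to_bytes(4, "big")


def partition_for(ip: str, secret: bytes, num_partitions: int,
                  exit_nodes=frozenset()) -> int:
    """Map a requester to a partition using a keyed hash of its network."""
    digest = hmac.new(secret, network_key(ip, exit_nodes), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Two requesters inside the same /24 always land in the same partition, so controlling many addresses on one network gains the attacker nothing.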

The third strategy combines the time-based and location-based strategies to further constrain and
rate-limit the available bridge addresses. Specifically, the bridge address provided in a given time
slot to a given network location is deterministic within the partition, rather than chosen randomly
each time from the partition. Thus, repeated requests during that time slot from a given network are
given the same bridge address as the first request. 
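
One way to get this deterministic behaviour is to derive the index into the partition from a keyed hash of the network and the time slot. This is a sketch under our own assumptions (HMAC-SHA256, an integer slot counter, a byte string identifying the requester's network), not a description of a deployed scheme.

```python
import hashlib
import hmac


def bridge_for_request(partition, network: bytes, time_slot: int, secret: bytes):
    """Pick the bridge for (network, time_slot) deterministically.

    Assumption: `network` is the requester's address with the last octet
    dropped, and `time_slot` is a non-negative slot counter.
    """
    message = network + time_slot.to_bytes(8, "big")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(partition)
    return partition[index]
```

Repeating the request from the same network within the same slot returns the same bridge, so hammering the authority yields at most one address per network per slot.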

The fourth strategy is based on Circumventor’s discovery strategy. The Circumventor project,
realizing that its adoption will remain limited if it has no central coordination mechanism, has
started a mailing list to distribute new proxy addresses every few days. From experimentation it
seems they have concluded that sending updates every three or four days is sufficient to stay ahead
of the current attackers. 

The fifth strategy provides an alternative approach to a mailing list: users provide an email 
address and receive an automated response listing an available bridge address. We could limit one 
response per email address. To further rate limit queries, we could require a CAPTCHA solution 
in each case too. In fact, we wouldn’t need to implement the CAPTCHA on our side: if we only 
deliver bridge addresses to Yahoo or GMail addresses, we can leverage the rate-limiting schemes 
that other parties already impose for account creation. 
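
A sketch of such an autoresponder, under stated assumptions: the allowed domains, the class name, and the choice to repeat the same bridge on later requests from the same address (rather than refusing them) are ours, and mail parsing and sender authentication are left out entirely.

```python
ALLOWED_DOMAINS = {"gmail.com", "yahoo.com"}  # assumed provider list


class EmailDistributor:
    """Toy responder (hypothetical): at most one bridge per allowed address."""

    def __init__(self, bridges):
        self._bridges = list(bridges)
        self._answered = {}  # normalized address -> bridge already handed out

    def handle_request(self, sender: str):
        local, _, domain = sender.strip().lower().rpartition("@")
        if not local or domain not in ALLOWED_DOMAINS:
            return None  # ignore providers without account-creation limits
        if not self._bridges:
            return None
        address = f"{local}@{domain}"
        if address not in self._answered:
            # Hand out bridges in rotation; a repeat request gets the same one.
            slot = len(self._answered) % len(self._bridges)
            self._answered[address] = self._bridges[slot]
        return self._answered[address]
```

The rate limit here comes entirely from how hard it is to create new accounts at the allowed providers, which is the property the text proposes to borrow.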

The sixth strategy ties in the social network design with public bridges and a reputation system. 
We pick some seeds — trusted people in blocked areas — and give them each a few dozen bridge 
addresses and a few delegation tokens. We run a website next to the bridge authority, where users 
can log in (they connect via Tor, and they don’t need to provide actual identities, just persistent 
pseudonyms). Users can delegate trust to other people they know by giving them a token, which can 
be exchanged for a new account on the website. Accounts in “good standing” then accrue new bridge 
addresses and new tokens. As usual, reputation schemes bring in a host of new complexities [9]: 
how do we decide that an account is in good standing? We could tie reputation to whether the bridges 
they’re told about have been blocked — see Section 7.7 below for initial thoughts on how to discover 
whether bridges have been blocked. We could track reputation between accounts (if you delegate 
to somebody who screws up, it impacts you too), or we could use blinded delegation tokens [5] to 
prevent the website from mapping the seeds’ social network. We put off deeper discussion of the 
social network reputation strategy for future work. 

Pools seven and eight are held in reserve, in case our currently deployed tricks all fail at once 
and the adversary blocks all those bridges — so we can adapt and move to new approaches quickly, 
and have some bridges immediately available for the new schemes. New strategies might be based 
on some other scarce resource, such as relaying traffic for others or other proof of energy spent. (We
might also worry about the incentives for bridges that sign up and get allocated to the reserve pools:
will they be unhappy that they’re not being used? But this is a transient problem: if Tor users are
bridges by default, nobody will mind not being used yet. See also Section 9.4.)

7.5 Public bridges with coordinated discovery 

We presented the above discovery strategies in the context of a single bridge directory authority, but
in practice we will want to distribute the operations over several bridge authorities, since a single point
of failure or attack is a bad move. The first answer is to run several independent bridge directory
authorities, and bridges gravitate to one based on their identity key. The better answer would be
some federation of bridge authorities that work together to provide redundancy but don’t introduce
new security issues. We could even imagine designs where the bridge authorities have encrypted
versions of the bridge’s relay descriptors, and the users learn a decryption key that they keep private
when they first hear about the bridge; this way the bridge authorities would not be able to learn the
IP address of the bridges.

We leave this design question for future work.

7.6 Assessing whether bridges are useful 

Learning whether a bridge is useful is important in the bridge authority’s decision to include it in
responses to blocked users. For example, if we end up with a list of thousands of bridges and only
a few dozen of them are reachable right now, most blocked users will not end up knowing about
working bridges. 

There are three components for assessing how useful a bridge is. First, is it reachable from the
public Internet? Second, what proportion of the time is it available? Third, is it blocked in certain
jurisdictions?

The first component can be tested just as we test reachability of ordinary Tor relays. Specifically,
the bridges do a self-test (connect to themselves via the Tor network) before they are willing to
publish their descriptor, to make sure they’re not obviously broken or misconfigured. Once the
bridges publish, the bridge authority also tests reachability to make sure they’re not confused or
outright lying.

The second component can be measured and tracked by the bridge authority. By doing periodic
reachability tests, we can get a sense of how often the bridge is available. More complex tests will
involve bandwidth-intensive checks to force the bridge to commit resources in order to be counted
as available. We need to evaluate how the relationship of uptime percentage should weigh into our
choice of which bridges to advertise. We leave this to future work.
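
For illustration only, the simplest availability measure is the fraction of periodic probes that succeeded. How that figure should influence which bridges get advertised is exactly the open question above, so the threshold below is a placeholder assumption rather than a recommendation.

```python
def uptime_fraction(probe_results) -> float:
    """Fraction of reachability probes that succeeded (0.0 if none were run)."""
    results = list(probe_results)
    if not results:
        return 0.0
    return sum(1 for ok in results if ok) / len(results)


def advertisable(bridge_probes: dict, min_uptime: float = 0.5) -> list:
    """Names of bridges whose measured uptime meets a placeholder threshold."""
    return [name for name, probes in bridge_probes.items()
            if uptime_fraction(probes) >= min_uptime]
```

A bridge probed four times with one failure scores 0.75 and would pass the placeholder cutoff; a bridge that answered only one probe in four would not.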

The third component is perhaps the trickiest: with many different adversaries out there, how do
we keep track of which adversaries have blocked which bridges, and how do we learn about new
blocks as they occur? We examine this problem next.

7.7 How do we know if a bridge relay has been blocked? 

There are two main mechanisms for testing whether bridges are reachable from inside each blocked
area: active testing via users, and passive testing via bridges.

In the case of active testing, certain users inside each area sign up as testing relays. The bridge
authorities can then use a Blossom-like [15] system to build circuits through them to each bridge
and see if it can establish the connection. But how do we pick the users? If we ask random users
to do the testing (or if we solicit volunteers from the users), the adversary should sign up so he can
enumerate the bridges we test. Indeed, even if we hand-select our testers, the adversary might still
discover their location and monitor their network activity to learn bridge addresses.

Another answer is not to measure directly, but rather let the bridges report whether they’re
being used. Specifically, bridges should install a GeoIP database such as the public IP-To-Country
list [18], and then periodically report to the bridge authorities which countries they’re seeing use
from. This data would help us track which countries are making use of the bridge design, and
can also let us learn about new steps the adversary has taken in the arms race. (The compressed
GeoIP database is only several hundred kilobytes, and we could even automate the update process
by serving it from the bridge authorities.) More analysis of this passive reachability testing design
is needed to resolve its many edge cases: for example, if a bridge stops seeing use from a certain
area, does that mean the bridge is blocked or does that mean those users are asleep?

There are many more problems with the general concept of detecting whether bridges are
blocked. First, different zones of the Internet are blocked in different ways, and the actual firewall
jurisdictions do not match country borders. Our bridge scheme could help us map out the topology
of the censored Internet, but this is a huge task. More generally, if a bridge relay isn’t reachable,
is that because of a network block somewhere, because of a problem at the bridge relay, or just a
temporary outage somewhere in between? And last, an attacker could poison our bridge database
by signing up already-blocked bridges. In this case, if we’re stingy giving out bridge addresses,
users in that country won’t learn working bridges.

All of these issues are made more complex when we try to integrate this testing into our social 
network reputation system above. Since in that case we punish or reward users based on whether 
bridges get blocked, the adversary has new attacks to trick or bog down the reputation tracking. 
Indeed, the bridge authority doesn’t even know what zone the blocked user is in, so do we blame 
him for any possible censored zone, or what? 

Clearly more analysis is required. The eventual solution will probably involve a combination of 
passive measurement via GeoIP and active measurement from trusted testers. More generally, we 
can use the passive feedback mechanism to track usage of the bridge network as a whole — which 
would let us respond to attacks and adapt the design, and it would also let the general public track 
the progress of the project. 

7.8 Advantages of deploying all solutions at once 

For once, we’re not in the position of the defender: we don’t have to defend against every possible 
filtering scheme; we just have to defend against at least one. On the flip side, the attacker is forced 
to guess how to allocate his resources to defend against each of these discovery strategies. So by 
deploying all of our strategies at once, we not only increase our chances of finding one that the 
adversary has difficulty blocking, but we actually make all of the strategies more robust in the face 
of an adversary with limited resources. 

8 Security considerations 

8.1 Possession of Tor in oppressed areas 

Many people speculate that installing and using a Tor client in areas with particularly extreme 
firewalls is a high risk — and the risk increases as the firewall gets more restrictive. This notion 
certainly has merit, but there’s a counter pressure as well: as the firewall gets more restrictive, more 
ordinary people behind it end up using Tor for more mainstream activities, such as learning about 
Wall Street prices or looking at pictures of women’s ankles. So as the restrictive firewall pushes up 
the number of Tor users, the “typical” Tor user becomes more mainstream, and therefore mere use 
or possession of the Tor software is not so surprising. 

It’s hard to say which of these pressures will ultimately win out, but we should keep both sides 
of the issue in mind. 

8.2 Observers can tell who is publishing and who is reading 

Tor encrypts traffic on the local network, and it obscures the eventual destination of the communi- 
cation, but it doesn’t do much to obscure the traffic volume. In particular, a user publishing a home 
video will have a different network fingerprint than a user reading an online news article. Based on 
our assumption in Section 2 that users who publish material are in more danger, should we work to 
improve Tor’s security in this situation? 

In the general case this is an extremely challenging task: effective end-to-end traffic confirma- 
tion attacks are known where the adversary observes the origin and the destination of traffic and 
confirms that they are part of the same communication [7, 23]. Related are website fingerprinting 
attacks, where the adversary downloads a few hundred popular websites, makes a set of “fingerprints”
for each site, and then observes the target Tor client’s traffic to look for a match [4, 20].
But can we do better against a limited adversary who just does coarse-grained sweeps looking for
unusually prolific publishers?

One answer is for bridge users to automatically send bursts of padding traffic periodically. (This
traffic can be implemented in terms of long-range drop cells, which are already part of the Tor
specification.) Of course, convincingly simulating an actual human publishing interesting content
is a difficult arms race, but it may be worthwhile to at least start the race. More research remains.

8.3 Anonymity effects from acting as a bridge relay 

Against some attacks, relaying traffic for others can improve anonymity. The simplest example is
an attacker who owns a small number of Tor relays. He will see a connection from the bridge, but
he won’t be able to know whether the connection originated there or was relayed from somebody
else. More generally, the mere uncertainty of whether the traffic originated from that user may be
helpful. 

There are some cases where it doesn’t seem to help: if an attacker can watch all of the bridge’s
incoming and outgoing traffic, then it’s easy to learn which connections were relayed and which
started there. (In this case he still doesn’t know the final destinations unless he is watching them
too, but in this case bridges are no better off than if they were an ordinary client.)

There are also some potential downsides to running a bridge. First, while we try to make it 
hard to enumerate all bridges, it’s still possible to learn about some of them, and for some people 
just the fact that they’re running one might signal to an attacker that they place a higher value on
their anonymity. Second, there are some more esoteric attacks on Tor relays that are not as well-understood
or well-tested: for example, an attacker may be able to “observe” whether the bridge is
sending traffic even if he can’t actually watch its network, by relaying traffic through it and noticing
changes in traffic timing [24]. On the other hand, it may be that limiting the bandwidth the bridge
is willing to relay will allow this sort of attacker to determine if it’s being used as a bridge but not
easily learn whether it is adding traffic of its own.

We also need to examine how entry guards fit in. Entry guards (a small set of nodes that are 
always used for the first step in a circuit) help protect against certain attacks where the attacker
runs a few Tor relays and waits for the user to choose these relays as the beginning and end of her
circuit^. If the blocked user doesn’t use the bridge’s entry guards, then the bridge doesn’t gain as
much cover benefit. On the other hand, what design changes are needed for the blocked user to use
the bridge’s entry guards without learning what they are (this seems hard), and even if we solve that, 
do they then need to use the guards’ guards and so on down the line? 

It is an open research question whether the benefits of running a bridge outweigh the risks. A
lot of the decision rests on which attacks the users are most worried about. For most users, we don’t
think running a bridge relay will be that damaging, and it could help quite a bit.

8.4 Trusting local hardware: Internet cafes and LiveCDs 

Assuming that users have their own trusted hardware is not always reasonable. 

^http://wiki.noreply.org/noreply/TheOnionRouter/TorFAQ#EntryGuards

For Internet cafe Windows computers that let you attach your own USB key, a USB-based Tor
image would be smart. There’s Torpark, and hopefully there will be more thoroughly analyzed
and trustworthy options down the road. Worries remain about hardware or software keyloggers and
other spyware, as well as physical surveillance.

If the system lets you boot from a CD or from a USB key, you can gain a bit more security by
bringing a privacy LiveCD with you. (This approach isn’t foolproof either of course, since hardware
keyloggers and physical surveillance are still a worry.)

In fact, LiveCDs are also useful if it’s your own hardware, since it’s easier to avoid leaving
private data and logs scattered around the system.

8.5 The trust chain 

Tor’s “public key infrastructure” provides a chain of trust to let users verify that they’re actually
talking to the right relays. There are four pieces to this trust chain.

First, when Tor clients are establishing circuits, at each step they demand that the next Tor relay
in the path prove knowledge of its private key [10]. This step prevents the first node in the path
from just spoofing the rest of the path. Second, the Tor directory authorities provide a signed list
of relays along with their public keys, so unless the adversary can control a threshold of directory
authorities, he can’t trick the Tor client into using other Tor relays. Third, the location and keys of
the directory authorities, in turn, are hard-coded in the Tor source code, so as long as the user got a
genuine version of Tor, he can know that he is using the genuine Tor network. And last, the source
code and other packages are signed with the GPG keys of the Tor developers, so users can confirm
that they did in fact download a genuine version of Tor.

In the case of blocked users contacting bridges and bridge directory authorities, the same logic
applies in parallel: the blocked users fetch information from both the bridge authorities and the
directory authorities for the ‘main’ Tor network, and they combine this information locally.

How can a user in an oppressed country know that he has the correct key fingerprints for the
developers? As with other security systems, it ultimately comes down to human interaction. The
keys are signed by dozens of people around the world, and we have to hope that our users have met
enough people in the PGP web of trust that they can learn the correct keys. For users that aren’t
connected to the global security community, though, this question remains a critical weakness.

9 Maintaining reachability 

9.1 How many bridge relays should you know about? 

The strategies described in Section 7 talked about learning one bridge address at a time. But if most
bridges are ordinary Tor users on cable modem or DSL connections, many of them will disappear
and/or move periodically. How many bridge relays should a blocked user know about so that she
is likely to have at least one reachable at any given point? This is already a challenging problem
if we only consider natural churn: the best approach is to see what bridges we attract in reality
and measure their churn. We may also need to factor in a parameter for how quickly bridges get
discovered and blocked by the attacker; we leave this for future work after we have more deployment
experience.

A related question is: if the bridge relays change IP addresses periodically, how often does the
blocked user need to fetch updates in order to keep from being cut out of the loop?
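
One rough way to frame the "how many bridges" question is sketched below. It assumes, purely for
illustration, that each known bridge is reachable independently with the same probability p at any
given moment; the paper gives no churn model, and the function names are hypothetical. Under that
assumption a user who knows k bridges has at least one reachable with probability 1 - (1 - p)^k.

```python
def prob_at_least_one_reachable(p: float, k: int) -> float:
    """Chance that at least one of k bridges is up, assuming independent churn."""
    return 1.0 - (1.0 - p) ** k


def bridges_needed(p: float, target: float) -> int:
    """Smallest k for which prob_at_least_one_reachable(p, k) >= target."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    k = 1
    # Each extra bridge shrinks the failure probability by a factor (1 - p),
    # so this loop always terminates.
    while prob_at_least_one_reachable(p, k) < target:
        k += 1
    return k


if __name__ == "__main__":
    # Hypothetical numbers: bridges up half the time, 99% reachability goal.
    print(bridges_needed(0.5, 0.99))
```

Real values of p would have to come from measured churn, and blocking by the attacker would make
bridge failures correlated, so this estimate is optimistic.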

Once we have more experience and intuition, we should explore technical solutions to this
problem too. For example, if the discovery strategies give out k bridge addresses rather than a single
bridge address, perhaps we can improve robustness from the user perspective without significantly
aiding the adversary. Rather than giving out a new random subset of k addresses at each point, we
could bind them together into bridge families, so all users that learn about one member of the bridge
family are told about the rest as well.

This scheme may also help defend against attacks to map the set of bridges. That is, if all
blocked users learn a random subset of bridges, the attacker can learn about a few bridges,
monitor the country-level firewall for connections to them, then watch those users to see what other
bridges they use, and repeat. By segmenting the bridge address space, we can limit the exposure of
other users.
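
The bridge-family idea can be sketched as follows. This is a minimal illustration under stated
assumptions, not a proposed implementation: the fixed chunking rule, the function names, and the
example addresses (taken from the 192.0.2.0/24 documentation range) are all hypothetical.

```python
from typing import List, Set


def make_families(bridges: List[str], k: int) -> List[List[str]]:
    """Partition the bridge list into fixed families of at most k members."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [bridges[i:i + k] for i in range(0, len(bridges), k)]


def family_of(bridge: str, families: List[List[str]]) -> List[str]:
    """Everything a user is told once she learns about one family member."""
    for family in families:
        if bridge in family:
            return list(family)
    raise KeyError(bridge)


def exposed_bridges(known: List[str], families: List[List[str]]) -> Set[str]:
    """Bridges an attacker can map by watching users of the bridges he knows.

    Users of a family only ever connect to that family, so the crawl
    stops at the family boundary.
    """
    exposed: Set[str] = set()
    for bridge in known:
        exposed.update(family_of(bridge, families))
    return exposed


if __name__ == "__main__":
    bridges = [f"192.0.2.{i}:443" for i in range(1, 7)]
    families = make_families(bridges, k=3)
    print(exposed_bridges(["192.0.2.2:443"], families))
```

With random subsets, each watched user may reveal bridges outside the attacker's current set; with
fixed families, learning one member exposes only the other members of that family.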

9.2 Cablemodem users don’t usually provide important websites 

Another attack we might be concerned about is that the attacker could just block all DSL and
cablemodem network addresses, on the theory that they don’t run any important services anyway.
If most of our bridges are on these networks, this attack could really hurt.

The first answer is to aim to get volunteers both from traditionally “consumer” networks and
also from traditionally “producer” networks. Since bridges don’t need to be Tor exit nodes, as we
improve our usability it seems quite feasible to get a lot of websites helping out.

The second answer (not as practical) would be to encourage more use of consumer networks for
popular and useful Internet services.

A related attack we might worry about is based on large countries putting economic pressure
on companies that want to expand their business. For example, what happens if Verizon wants to
sell services in China, and China pressures Verizon to discourage its users in the free world from
running bridges?

9.3 Scanning resistance: making bridges more subtle 

If it’s trivial to verify that a given address is operating as a bridge, and most bridges run on a
predictable port, then it’s conceivable our attacker could scan the whole Internet looking for bridges.
(In fact, he can just concentrate on scanning likely networks like cablemodem and DSL services;
see Section 9.2 above for related attacks.) It would be nice to slow down this attack. It would be
even nicer to make it hard to learn whether we’re a bridge without first knowing some secret. We call
this general property scanning resistance, and it goes along with normalizing Tor’s TLS handshake
and network fingerprint.

We could provide a password to the blocked user, and she (or her Tor client) provides a nonced
hash of this password when she connects. We’d need to give her an ID key for the bridge too
(in addition to the IP address and port; see Section 6.1), and wait to present the password until
we’ve finished the TLS handshake, else it would look unusual. If Alice can authenticate the bridge
before she tries to send her password, we can resist an adversary who pretends to be the bridge and
launches a man-in-the-middle attack to learn the password. But even if she can’t, we still resist
against widespread scanning.

How should the bridge behave if accessed without the correct authorization? Perhaps it should
act like an unconfigured HTTPS server (“welcome to the default Apache page”), or maybe it should
mirror and act like common websites, or websites randomly chosen from Google.
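
The password check and the fall-back behavior can be sketched together. The text above does not
specify the hash construction, so everything concrete here is an assumption: HMAC-SHA256 as the
"nonced hash", a 16-byte client-chosen nonce, a replay cache on the bridge, and all names. The
sketch also assumes the exchange happens inside an already completed TLS session.

```python
import hashlib
import hmac
import secrets

# Hypothetical stand-in for whatever an unconfigured HTTPS server would return.
DECOY_RESPONSE = b"default web page"
BRIDGE_OK = b"BRIDGE_OK"


def make_token(password: bytes, nonce: bytes) -> bytes:
    """Nonced hash of the password (assumed here to be HMAC-SHA256)."""
    return hmac.new(password, nonce, hashlib.sha256).digest()


def client_hello(password: bytes):
    """Client side: pick a fresh nonce and derive the matching token."""
    nonce = secrets.token_bytes(16)
    return nonce, make_token(password, nonce)


class Bridge:
    def __init__(self, password: bytes):
        self._password = password
        self._seen_nonces = set()  # rejects replays of an observed pair

    def handle(self, nonce: bytes, token: bytes) -> bytes:
        expected = make_token(self._password, nonce)
        fresh = nonce not in self._seen_nonces
        # compare_digest avoids leaking how many leading bytes matched.
        if fresh and hmac.compare_digest(expected, token):
            self._seen_nonces.add(nonce)
            return BRIDGE_OK
        # Without valid authorization, look like an ordinary web server.
        return DECOY_RESPONSE


if __name__ == "__main__":
    bridge = Bridge(b"shared secret")
    nonce, token = client_hello(b"shared secret")
    print(bridge.handle(nonce, token))
    print(bridge.handle(nonce, token))  # replayed pair gets the decoy
```

A scanner that does not know the password only ever sees the decoy response, which is the
scanning-resistance property described above. The sketch does not address a man-in-the-middle who
relays a live exchange; that depends on Alice authenticating the bridge first.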

We might assume that the attacker can recognize HTTPS connections that use self-signed
certificates. (This process would be resource-intensive but not out of the realm of possibility.) But
even in this case, many popular websites around the Internet use self-signed or just plain broken
SSL certificates.

9.4 How to motivate people to run bridge relays 

One of the traditional ways to get people to run software that benefits others is to give them
motivation to install it themselves. An often suggested approach is to install it as a stunning screensaver
so everybody will be pleased to run it. We take a similar approach here, by leveraging the fact that
these users are already interested in protecting their own Internet traffic, so they will install and run
the software.

Eventually, we may be able to make all Tor users become bridges if they pass their
self-reachability tests; the software and installers need more work on usability first, but we’re making
progress.

In the mean time, we can make a snazzy network graph with Vidalia^ that emphasizes the
connections the bridge user is currently relaying.

9.5 Publicity attracts attention 

Many people working on this field want to publicize the existence and extent of censorship
concurrently with the deployment of their circumvention software. The easy reason for this two-pronged
push is to attract volunteers for running proxies in their systems; but in many cases their main
goal is not to focus on getting more users signed up, but rather to educate the rest of the world
about the censorship. The media also tries to do its part by broadcasting the existence of each new
circumvention system.

But at the same time, this publicity attracts the attention of the censors. We can slow down the
arms race by not attracting as much attention, and just spreading by word of mouth. If our goal is to
establish a solid social network of bridges and bridge users before the adversary gets involved, does
this extra attention work to our disadvantage?

9.6 The Tor website: how to get the software 

One of the first censoring attacks against a system like ours is to block the website and make the
software itself hard to find. Our system should work well once the user is running an authentic copy
of Tor and has found a working bridge, but to get to that point we rely on their individual skills and
ingenuity.

Right now, most countries that block access to Tor block only the main website and leave mirrors
and the network itself untouched. Falling back on word-of-mouth is always a good last resort, but
we should also take steps to make sure it’s relatively easy for users to get a copy, such as publicizing
the mirrors more and making copies available through other media. We might also mirror the latest
version of the software on each bridge, so users who hear about an honest bridge can get a good
copy. See Section 7.1 for more discussion.

^http://vidalia-project.net/


10 Next Steps 

Technical solutions won’t solve the whole censorship problem. After all, the firewalls in places
like China are socially very successful, even if technologies and tricks exist to get around them.
However, having a strong technical solution is still necessary as one important piece of the puzzle.

In this paper, we have shown that Tor provides a great set of building blocks to start from. The
next steps are to deploy prototype bridges and bridge authorities, implement some of the proposed
discovery strategies, and then observe the system in operation and get more intuition about the
actual requirements and adversaries we’re up against.